Resolving Surface Forms to Wikipedia Topics
Abstract
Ambiguity of entity mentions and concept references is a challenge to mining text beyond surface-level keywords. We describe an effective method of disambiguating surface forms and resolving them to Wikipedia entities and concepts. Our method employs an extensive set of features mined from Wikipedia and other large data sources, and combines the features using a machine learning approach with automatically generated training data. Based on a manually labeled evaluation set containing over 1000 news articles, our resolution model has 85% precision and 87.8% recall. The performance is significantly better than three baselines based on traditional context similarities or sense commonness measurements. Our method can be applied to other languages and scales well to new entities and concepts.
Similar resources
Predicting and Identifying Hypertext in Wikipedia Articles
1. Ratinov, Roth, Downey, and Anderson. Local and Global Algorithms for Disambiguation to Wikipedia. (University of Illinois at Urbana-Champaign). Retrieved from http://web.eecs.umich.edu/~mrander/pubs/RatinovDoRo.pdf 2. Zhou, Nie, Rouhani-Kalleh, Vasile, and Gaffney. Resolving surface forms to Wikipedia topics. (ACM Digital Library). Retrieved from http://dl.acm.org/citation.cfm?id=1873931 3. ...
Gathering Alternative Surface Forms for DBpedia Entities
Wikipedia is often used as a source of surface forms, or alternative reference strings for an entity, required for entity linking, disambiguation or coreference resolution tasks. Surface forms have been extracted in a number of works from Wikipedia labels, redirects, disambiguations and anchor texts of internal Wikipedia links, which we complement with anchor texts of external Wikipedia links from...
Unsupervised Name Ambiguity Resolution Using A Generative Model
Resolving ambiguity associated with names found on the Web, Wikipedia or medical texts is a very challenging task, which has been of great interest to the research community. We propose a novel approach to disambiguating names using Latent Dirichlet Allocation, where the learned topics represent the underlying senses of the ambiguous name. We conduct a detailed evaluation on multiple data sets ...
Topical Generalization for Presentation of User Profiles
Fine-grained user profile generation approaches have made it increasingly feasible to display on a profile page in which topics a user has expertise or interest. Earlier work on topical user profiling has been directed at enhancing search and personalization functionality, but making such profiles useful for human consumption presents new challenges. With this work, we have taken a first step t...
Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore
Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...
Publication date: 2010